16 research outputs found

    Speaker identification and clustering using convolutional neural networks

    Deep learning, especially in the form of convolutional neural networks (CNNs), has triggered substantial improvements in computer vision and related fields in recent years. This progress is attributed to the shift from designing features and subsequent individual sub-systems towards learning features and recognition systems end to end from nearly unprocessed data. For speaker clustering, however, it is still common to use handcrafted processing chains such as MFCC features and GMM-based models. In this paper, we use simple spectrograms as input to a CNN and study the optimal design of those networks for speaker identification and clustering. Furthermore, we elaborate on the question of how to transfer a network trained for speaker identification to speaker clustering. We demonstrate our approach on the well-known TIMIT dataset, achieving results comparable with the state of the art, without the need for handcrafted features.

    Learning embeddings for speaker clustering based on voice equality

    Recent work has shown that convolutional neural networks (CNNs) trained in a supervised fashion for speaker identification are able to extract features from spectrograms which can be used for speaker clustering. These features are represented by the activations of a certain hidden layer and are called embeddings. However, previous approaches require plenty of additional speaker data to learn the embedding. Although the clustering results are then on par with more traditional approaches using MFCC features etc., there is room for improvement because these embeddings are trained with a surrogate task that is rather far away from segregating unknown voices, namely identifying a few specific speakers. We address both problems by training a CNN to extract embeddings that are similar for equal speakers (regardless of their specific identity) using weakly labeled data. We demonstrate our approach on the well-known TIMIT dataset, which has often been used for speaker clustering experiments in the past. We exceed the clustering performance of all previous approaches, but require just 100 instead of 590 unrelated speakers to learn an embedding suited for clustering.
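To illustrate how embeddings that are "similar for equal speakers" could be turned into clusters, here is a minimal sketch. The abstract does not say which clustering algorithm or distance is used, so complete-linkage agglomerative clustering over cosine distance is an assumption, as are the function names `cosine_distance` and `cluster_embeddings`, the threshold value, and the toy two-dimensional embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster_embeddings(embeddings, threshold):
    """Complete-linkage agglomerative clustering (assumed, not the paper's method).

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than `threshold`. Returns clusters as lists of indices.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two most dissimilar members.
        return max(
            cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2
        )

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # remaining clusters are too dissimilar to merge
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy embeddings: utterances 0 and 1 point one way, 2 and 3 another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_clusters = cluster_embeddings(toy_embeddings, threshold=0.2)
print(toy_clusters)
```

In this sketch the threshold decides how many speakers are found, so in practice it would have to be tuned on held-out data.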

    Breathing as an Input Modality in a Gameful Breathing Training App (Breeze 2): Development and Evaluation Study

    Background: Slow-paced breathing training can have positive effects on physiological and psychological well-being. Unfortunately, use statistics indicate that adherence to breathing training apps is low. Recent work suggests that gameful breathing training may help overcome this challenge. Objective: This study aimed to introduce and evaluate the gameful breathing training app Breeze 2 and its novel real-time breathing detection algorithm that enables the interactive components of the app. Methods: We developed the breathing detection algorithm by using deep transfer learning to detect inhalation, exhalation, and nonbreathing sounds (including silence). An additional heuristic prolongs detected exhalations to stabilize the algorithm’s predictions. We evaluated Breeze 2 with 30 participants (women: n=14, 47%; age: mean 29.77, SD 7.33 years). Participants performed breathing training with Breeze 2 in 2 sessions with and without headphones. They answered questions regarding user engagement (User Engagement Scale Short Form [UES-SF]), perceived effectiveness (PE), perceived relaxation effectiveness, and perceived breathing detection accuracy. We used Wilcoxon signed-rank tests to compare the UES-SF, PE, and perceived relaxation effectiveness scores with neutral scores. Furthermore, we correlated perceived breathing detection accuracy with actual multi-class balanced accuracy to determine whether participants could perceive the actual breathing detection performance. We also conducted a repeated-measures ANOVA to investigate breathing detection differences in balanced accuracy with and without the heuristic and when classifying data captured from headphones and smartphone microphones. The analysis controlled for potential between-subject effects of the participants’ sex. Results: Our results show scores that were significantly higher than neutral scores for the UES-SF (W=459; P<.001), PE (W=465; P<.001), and perceived relaxation effectiveness (W=358; P<.001). Perceived breathing detection accuracy correlated significantly with the actual multi-class balanced accuracy (r=0.51; P<.001). Furthermore, we found that the heuristic significantly improved the breathing detection balanced accuracy (F(1,25)=6.23; P=.02) and that detection performed better on data captured from smartphone microphones than on data from headphones (F(1,25)=17.61; P<.001). We did not observe any significant between-subject effects of sex. Breathing detection without the heuristic reached a multi-class balanced accuracy of 74% on the collected audio recordings. Conclusions: Most participants (28/30, 93%) perceived Breeze 2 as engaging and effective. Furthermore, breathing detection worked well for most participants, as indicated by the perceived detection accuracy and actual detection accuracy. In future work, we aim to use the collected breathing sounds to improve breathing detection with regard to its stability and performance. We also plan to use Breeze 2 as an intervention tool in various studies targeting the prevention and management of noncommunicable diseases.
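The abstract above reports multi-class balanced accuracy and a heuristic that prolongs detected exhalations. The sketch below shows how both could look. Balanced accuracy is computed in its usual sense as the mean of per-class recall. The abstract does not describe how the heuristic works, so `prolong_exhalations`, its `hold` parameter, the label names, and the toy frame sequences are hypothetical illustrations only.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def prolong_exhalations(labels, hold=1):
    """Hypothetical smoothing rule (not taken from the paper).

    After a frame labelled "exhale", up to `hold` directly following
    "none" frames are relabelled as "exhale".
    """
    out = []
    remaining = 0
    for label in labels:
        if label == "exhale":
            remaining = hold
            out.append(label)
        elif label == "none" and remaining > 0:
            remaining -= 1
            out.append("exhale")
        else:
            remaining = 0
            out.append(label)
    return out


# Toy frame labels: the raw predictions drop out in the middle of an exhalation.
truth = ["inhale", "inhale", "exhale", "exhale", "exhale", "exhale", "none", "none"]
raw = ["inhale", "inhale", "exhale", "none", "exhale", "none", "none", "none"]
smoothed = prolong_exhalations(raw, hold=1)

print(balanced_accuracy(truth, raw))
print(balanced_accuracy(truth, smoothed))
```

Averaging recall per class keeps a frequent class such as silence from dominating the score, which is presumably why a balanced metric is reported for the three-class problem.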

    Mobile Stress Management Applications: An Affordance-Theoretic Perspective on the Adoption and Use

    Chronic stress is a burden on mental and physical health. Despite the development and effectiveness of mobile stress management applications, their adoption and continued use remain low. Given that research revealed systematic differences in usage behavior among user types, we aim to investigate what drives these differences. We extend the affordance perspective and argue that accounting for psychological needs, actualized affordances, and actualization costs across different user types provides a deeper understanding of the factors driving the adoption and use of mobile stress management applications. The qualitative interview part of our mixed-methods study reveals eight affordances, eight actualization costs, and initial evidence for systematic differences among the user types. The quantitative questionnaire part will uncover the psychological needs, actualized affordances, and perceived actualization costs of the six user types. This work contributes a new theoretical perspective to overcome the gap in the adoption and usage of mobile stress management applications.

    Development of a digital biomarker and intervention for subclinical depression: study protocol for a longitudinal waitlist control study

    Background: Depression remains a global health problem, with its prevalence rising worldwide. Digital biomarkers are increasingly investigated to initiate and tailor scalable interventions targeting depression. Due to the steady influx of new cases, focusing on treatment alone will not suffice; academics and practitioners need to focus on the prevention of depression (i.e., addressing subclinical depression). Aim: With our study, we aim to (i) develop digital biomarkers for subclinical symptoms of depression, (ii) develop digital biomarkers for severity of subclinical depression, and (iii) investigate the efficacy of a digital intervention in reducing symptoms and severity of subclinical depression. Method: Participants will interact with the digital intervention BEDDA consisting of a scripted conversational agent, the slow-paced breathing training Breeze, and actionable advice for different symptoms. The intervention comprises 30 daily interactions to be completed in less than 45 days. We will collect self-reports regarding mood, agitation, anhedonia (proximal outcomes; first objective), self-reports regarding depression severity (primary distal outcome; second and third objective), anxiety severity (secondary distal outcome; second and third objective), stress (secondary distal outcome; second and third objective), voice, and breathing. A subsample of 25% of the participants will use smartwatches to record physiological data (e.g., heart rate, heart rate variability), which will be used in the analyses for all three objectives. Discussion: Digital voice- and breathing-based biomarkers may improve diagnosis, prevention, and care by enabling an unobtrusive and either complementary or alternative assessment to self-reports. Furthermore, our results may advance our understanding of underlying psychophysiological changes in subclinical depression. Our study also provides further evidence regarding the efficacy of standalone digital health interventions to prevent depression. Trial registration: Ethics approval was provided by the Ethics Commission of ETH Zurich (EK-2022-N-31) and the study was registered in the ISRCTN registry (Reference number: ISRCTN38841716, Submission date: 20/08/2022).

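    Breeze, a Gameful Biofeedback Breathing Training for the Smartphone: Physiological Responses and Subjective Assessments from a Laboratory and an Online Experiment [Breeze, ein spielerisches Biofeedback Atemtraining für das Smartphone: Physiologische Reaktionen und subjektive Einschätzungen aus einem Labor- und Online-Experiment]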

    Background: Slow breathing has a positive effect on cardiac function and psychological well-being. Corresponding breathing exercises are therefore often recommended for chronic diseases; however, for various reasons they are practiced only by certain groups of people and thus have limited reach and impact. Objective: The Breeze app aims to increase the reach of breathing exercises with a gameful and scalable biofeedback approach. Method: The Breeze breathing exercise is based on detecting breathing with the smartphone's microphone in order to generate a "tailwind" for a virtual sailboat during exhalation and thereby accelerate it. If the breathing cycle matches a validated pattern (e.g., 4 s inhalation, 2 s exhalation, and 4 s pause), the sailboat, which is displayed in real time on the smartphone screen, can cover the greatest travel distance. Laboratory and online experiments were conducted to evaluate Breeze with regard to physiological effects and subjective assessments in adults. Results: In the laboratory (N=16), Breeze not only led to an increase in heart rate variability (p<.001) but was also preferred by 14 (87.5%) participants over a validated breathing exercise without a gameful approach. An online experiment with participants who on average had little to no experience with breathing exercises further showed that the perceived relaxation from Breeze (N=88) is comparable to that of a validated breathing exercise (N=82) and that 51 (58.0%) participants would use Breeze in everyday life. Summary: With its gameful approach, Breeze has the potential to increase the reach of breathing exercises, which may be particularly relevant for self-management of chronic diseases.
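As a worked example of the pattern quoted in the abstract above (4 s inhalation, 2 s exhalation, 4 s pause), the sketch below computes the resulting breathing rate and checks whether a measured cycle matches the target. The matching rule and its `tolerance_s` parameter are assumptions for illustration; the abstract does not state how closely a cycle must match, and the function names are hypothetical.

```python
def breaths_per_minute(inhale_s, exhale_s, pause_s):
    """Breathing rate implied by one inhale/exhale/pause cycle."""
    cycle_s = inhale_s + exhale_s + pause_s
    return 60.0 / cycle_s


def matches_pattern(measured, target, tolerance_s=0.5):
    """Hypothetical rule: every phase must lie within tolerance_s of its target."""
    return all(abs(m - t) <= tolerance_s for m, t in zip(measured, target))


# Target pattern from the abstract: (inhale, exhale, pause) durations in seconds.
target = (4.0, 2.0, 4.0)
rate = breaths_per_minute(*target)  # 10 s per cycle, i.e. 6 breaths per minute
print(rate)
print(matches_pattern((4.3, 1.8, 4.1), target))
print(matches_pattern((2.0, 2.0, 1.0), target))
```

A 10 s cycle corresponds to 6 breaths per minute, which is in the range usually meant by slow-paced breathing.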

    Breeze: Smartphone-based Acoustic Real-time Detection of Breathing Phases for a Gamified Biofeedback Breathing Training

    Slow-paced biofeedback-guided breathing training has been shown to improve cardiac functioning and psychological well-being. Current training options, however, attract only a fraction of individuals and are limited in their scalability as they require dedicated biofeedback hardware. In this work, we present Breeze, a mobile application that uses a smartphone's microphone to continuously detect breathing phases, which then trigger a gamified biofeedback-guided breathing training. Circa 2.76 million breathing sounds from 43 subjects and control sounds were collected and labeled to train and test our breathing detection algorithm. We model breathing as inhalation-pause-exhalation-pause sequences and implement a phase-detection system with an attention-based LSTM model in conjunction with a CNN-based breath extraction module. A biofeedback-guided breathing training with Breeze takes place in real time and achieves 75.5% accuracy in breathing phase detection. Breeze was also evaluated in a pilot study with 16 new subjects, which demonstrated that the majority of subjects prefer Breeze over a validated active control condition in its usefulness, enjoyment, control, and usage intentions. Breeze is also effective for strengthening users' cardiac functioning by increasing high-frequency heart rate variability. The results of our study suggest that Breeze could potentially be utilized in clinical and self-care activities. ISSN:2474-956
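The abstract above models breathing as inhalation-pause-exhalation-pause sequences. The sketch below shows one way such a sequence model could be checked on frame-level labels. It is not the attention-based LSTM described in the abstract; the label names, `CYCLE`, and `follows_breathing_cycle` are hypothetical.

```python
# Assumed cyclic phase order, following the wording of the abstract.
CYCLE = ["inhale", "pause", "exhale", "pause"]


def collapse_repeats(labels):
    """Merge runs of identical consecutive labels into one phase each."""
    phases = []
    for label in labels:
        if not phases or phases[-1] != label:
            phases.append(label)
    return phases


def follows_breathing_cycle(labels):
    """True if the phases form a contiguous stretch of the repeating cycle."""
    phases = collapse_repeats(labels)
    # "pause" occurs twice in the cycle, so every starting offset is tried.
    return any(
        all(phase == CYCLE[(offset + i) % len(CYCLE)] for i, phase in enumerate(phases))
        for offset in range(len(CYCLE))
    )


print(follows_breathing_cycle(["inhale", "inhale", "pause", "exhale", "pause", "inhale"]))
print(follows_breathing_cycle(["inhale", "exhale"]))
```

A check like this could serve as a sanity filter on predicted label sequences, since an inhalation followed directly by an exhalation does not fit the assumed four-phase cycle.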